grand challenge
Multimedia Verification Through Multi-Agent Deep Research Multimodal Large Language Models
Le, Huy Hoan, Nguyen, Van Sy Thinh, Dang, Thi Le Chi, Nguyen, Vo Thanh Khang, Nguyen, Truong Thanh Hung, Cao, Hung
This paper presents our submission to the ACMMM25 - Grand Challenge on Multimedia Verification. We developed a multi-agent verification system that combines Multimodal Large Language Models (MLLMs) with specialized verification tools to detect multimedia misinformation. Our system operates through six stages: raw data processing, planning, information extraction, deep research, evidence collection, and report generation. The core Deep Researcher Agent employs four tools: reverse image search, metadata analysis, fact-checking databases, and verified news processing that extracts spatial, temporal, attribution, and motivational context. We demonstrate our approach on a challenge dataset sample involving complex multimedia content. Our system successfully verified content authenticity, extracted precise geolocation and timing information, and traced source attribution across multiple platforms, effectively addressing real-world multimedia verification scenarios.
- Europe > Ukraine > Dnipropetrovsk Oblast > Dnipro (0.06)
- Asia > Vietnam (0.05)
- North America > Canada > New Brunswick > York County > Fredericton (0.04)
- Information Technology > Security & Privacy (0.91)
- Media > News (0.70)
Preparing for the Intelligence Explosion
MacAskill, William, Moorhouse, Fin
AI that can accelerate research could drive a century of technological progress over just a few years. During such a period, new technological or political developments will raise consequential and hard-to-reverse decisions, in rapid succession. We call these developments grand challenges. These challenges include new weapons of mass destruction, AI-enabled autocracies, races to grab offworld resources, and digital beings worthy of moral consideration, as well as opportunities to dramatically improve quality of life and collective decision-making. We argue that these challenges cannot always be delegated to future AI systems, and suggest things we can do today to meaningfully improve our prospects. AGI preparedness is therefore not just about ensuring that advanced AI systems are aligned: we should be preparing, now, for the disorienting range of developments an intelligence explosion would bring.
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area (1.00)
Language Model Mapping in Multimodal Music Learning: A Grand Challenge Proposal
We have seen remarkable success in representation learning and language models (LMs) using deep neural networks. Many studies aim to build the underlying connections among different modalities via alignment and mappings at the token or embedding level, but so far, most methods are very data-hungry, limiting their performance in domains such as music where paired data are less abundant. We argue that embedding alignment is only the surface level of multimodal alignment. In this paper, we propose a grand challenge of language model mapping (LMM), i.e., how to map the essence implied in the LM of one domain to the LM of another domain under the assumption that LMs of different modalities are tracking the same underlying phenomena. We first introduce a basic setup of LMM, highlighting the goal to unveil a deeper aspect of cross-modal alignment as well as to achieve more sample-efficient learning. We then discuss why music is an ideal domain in which to conduct LMM research. After that, we connect LMM in music with a more general and challenging scientific problem of learning to take actions based on both sensory input and abstract symbols, and in the end, present an advanced version of the challenge problem setup.
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Grand Challenges in the Verification of Autonomous Systems
Leahy, Kevin, Asgari, Hamid, Dennis, Louise A., Feather, Martin S., Fisher, Michael, Ibanez-Guzman, Javier, Logan, Brian, Olszewska, Joanna I., Redfield, Signe
Autonomous systems use independent decision-making with only limited human intervention to accomplish goals in complex and unpredictable environments. As the autonomy technologies that underpin them continue to advance, these systems will find their way into an increasing number of applications in an ever wider range of settings. If we are to deploy them to perform safety-critical or mission-critical roles, it is imperative that we have justified confidence in their safe and correct operation. Verification is the process by which such confidence is established. However, autonomous systems pose challenges to existing verification practices. This paper highlights viewpoints of the Roadmap Working Group of the IEEE Robotics and Automation Society Technical Committee for Verification of Autonomous Systems, identifying these grand challenges, and providing a vision for future research efforts that will be needed to address them.
- North America > United States > Massachusetts > Worcester County > Worcester (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > California > Los Angeles County > Pasadena (0.04)
- Information Technology (1.00)
- Government > Regional Government (0.46)
Hyperspectral Reconstruction of Skin Through Fusion of Scattering Transform Features
Czaja, Wojciech, Emidih, Jeremiah, Kolstoe, Brandon, Spencer, Richard G.
Hyperspectral imagery (HSI) is an established technique with an array of applications, but its use is limited due to both practical and technical issues associated with spectral devices. The goal of the ICASSP 2024 'Hyper-Skin' Challenge is to extract skin HSI from matching RGB images and an infrared band. To address this problem we propose a model using features of the scattering transform - a type of convolutional neural network with predefined filters. Our model matches and inverts those features, rather than the pixel values, reducing the complexity of matching while grouping similar features together, resulting in an improved learning process.
- North America > United States > Maryland > Prince George's County > College Park (0.05)
- North America > United States > Maryland > Baltimore (0.05)
Overview of the L3DAS23 Challenge on Audio-Visual Extended Reality
Marinoni, Christian, Gramaccioni, Riccardo Fosco, Chen, Changan, Uncini, Aurelio, Comminiello, Danilo
The primary goal of the L3DAS23 Signal Processing Grand Challenge at ICASSP 2023 is to promote and support collaborative research on machine learning for 3D audio signal processing, with a specific emphasis on 3D speech enhancement and 3D Sound Event Localization and Detection in Extended Reality applications. As part of our latest competition, we provide a brand-new dataset, which maintains the same general characteristics as the L3DAS21 and L3DAS22 datasets, but with first-order Ambisonics recordings from multiple reverberant simulated environments. Moreover, we start exploring an audio-visual scenario by providing images of these environments, as perceived from the different microphone positions and orientations. We also propose updated baseline models for both tasks that can now support audio-image pairs as input, along with a supporting API to replicate our results. Finally, we present the results of the participants. Further details about the challenge are available at https://www.l3das.com/icassp2023.
- North America > United States > Texas > Travis County > Austin (0.05)
- Europe > Italy > Lazio > Rome (0.05)
Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift
Bai, Jisheng, Wang, Mou, Liu, Haohe, Yin, Han, Jia, Yafei, Huang, Siwei, Du, Yutong, Zhang, Dongzhe, Plumbley, Mark D., Shi, Dongyuan, Gan, Woon-Seng, Rahardja, Susanto, Xiang, Bin, Chen, Jianfeng
Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is domain shift caused by a distribution gap between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Although substantial progress has been made on device generalization in recent years, the challenge of domain shift between different regions, involving characteristics such as time, space, culture, and language, remains insufficiently explored. In addition, considering the abundance of unlabeled acoustic scene data in the real world, it is important to study possible ways to utilize these unlabeled data. Therefore, we introduce the task Semi-supervised Acoustic Scene Classification under Domain Shift in the ICME 2024 Grand Challenge. We encourage participants to innovate with semi-supervised learning techniques, aiming to develop more robust ASC models under domain shift.
ICASSP 2024 Speech Signal Improvement Challenge
Ristea, Nicolae Catalin, Saabas, Ando, Cutler, Ross, Naderi, Babak, Braun, Sebastian, Branets, Solomiya
The ICASSP 2024 Speech Signal Improvement Grand Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems. This marks our second challenge, building upon the success of the previous ICASSP 2023 Grand Challenge. We enhance the competition by introducing a dataset synthesizer that enables all participating teams to start at a higher baseline, an objective metric for our extended P.804 tests, and transcripts for the 2023 test set, and by adding Word Accuracy (WAcc) as a metric. We evaluate a total of 13 systems in the real-time track and 11 systems in the non-real-time track using both subjective P.804 and objective Word Accuracy metrics.
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Speech (0.94)
Othello is Solved
The game of Othello is one of the world's most complex and popular games that has yet to be computationally solved. Othello has roughly ten octodecillion (10 to the 58th power) possible game records and ten octillion (10 to the 28th power) possible game positions. The challenge of solving Othello, that is, determining the outcome of a game with no mistake made by either player, has long been a grand challenge in computer science. This paper announces a significant milestone: Othello is now solved. It is computationally proved that perfect play by both players leads to a draw. Strong Othello software has long been built using heuristically designed search techniques. Solving a game provides a solution that enables the software to play the game perfectly.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > Canada > Alberta (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Media > Theater (1.00)
- Leisure & Entertainment > Games > Othello (0.34)
Mapping the Challenges of HCI: An Application and Evaluation of ChatGPT and GPT-4 for Mining Insights at Scale
Oppenlaender, Jonas, Hämäläinen, Joonas
Large language models (LLMs), such as ChatGPT and GPT-4, are gaining widespread real-world use. Yet, these LLMs are closed-source, and little is known about their performance in real-world use cases. In this paper, we apply and evaluate the combination of ChatGPT and GPT-4 for the real-world task of mining insights from a text corpus in order to identify research challenges in the field of HCI. We extract 4,392 research challenges in over 100 topics from the 2023 CHI conference proceedings and visualize the research challenges for interactive exploration. We critically evaluate the LLMs on this practical task and conclude that the combination of ChatGPT and GPT-4 makes an excellent cost-efficient means for analyzing a text corpus at scale. Cost-efficiency is key for flexibly prototyping research ideas and analyzing text corpora from different perspectives, with implications for applying LLMs for mining insights in academia and practice.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Information Technology (1.00)
- Education (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)